Pin python in ci integration tests to prevent intermittent codspeed segfaults in walltime #105

Merged
GuillaumeLagrange merged 1 commit into master from cod-2199-pytest-codspeed-segfaults-in-walltime on Feb 6, 2026

Conversation

GuillaumeLagrange (Contributor) commented Feb 6, 2026

All observed crashes occurred with Python 3.14.3, although it did not crash every time.
Pin the Python version for now to unblock other PRs, and investigate later.

codspeed-hq bot commented Feb 6, 2026

Merging this PR will degrade performance by 18.32%

⚡ 10 improved benchmarks
❌ 14 (👁 14) regressed benchmarks
✅ 143 untouched benchmarks

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| | WallTime | `test_fs_read[1000]` | 857.1 ns | 795.3 ns | +7.77% |
| 👁 | WallTime | `test_threadpool_map[10]` | 2.5 ms | 2.6 ms | -2.06% |
| | WallTime | `test_multiprocessing_map[10000]` | 101.3 ms | 82.1 ms | +23.32% |
| 👁 | WallTime | `test_tcp_connection[google.com-443]` | 1.1 ms | 1.2 ms | -2.44% |
| | WallTime | `test_iir_filter_process` | 3.2 µs | 3.1 µs | +3.09% |
| 👁 | WallTime | `test_multiprocessing_map[1000]` | 57 ms | 69.8 ms | -18.32% |
| 👁 | WallTime | `test_noop_lambda_decorated` | 1.3 µs | 1.4 µs | -8.62% |
| 👁 | WallTime | `test_recursive_fibo_20` | 5.5 ms | 5.8 ms | -5.28% |
| 👁 | WallTime | `test_combination_sum[candidates0-8]` | 11.8 µs | 12.1 µs | -2.71% |
| 👁 | WallTime | `test_open_knight_tour[1]` | 4 µs | 4.1 µs | -2.94% |
| 👁 | WallTime | `test_tcp_connection[1.1.1.1-53]` | 834.6 µs | 1,001.5 µs | -16.67% |
| | WallTime | `test_multiprocessing_map[100]` | 54.2 ms | 53.1 ms | +2.01% |
| 👁 | WallTime | `test_noop_pass` | 483.4 ns | 506.5 ns | -4.56% |
| 👁 | WallTime | `test_make_highshelf` | 7.7 µs | 7.9 µs | -2.58% |
| 👁 | WallTime | `test_make_bandpass` | 5.9 µs | 6.1 µs | -3.54% |
| | WallTime | `test_array_alloc[100]` | 1.2 µs | 1.1 µs | +7.91% |
| | WallTime | `test_sum_of_squares[sum_of_squares_sum_comprehension_power]` | 227.1 µs | 218.2 µs | +4.09% |
| | WallTime | `test_noop_pass_decorated` | 748 ns | 729 ns | +2.61% |
| | WallTime | `test_make_highpass` | 5.9 µs | 5.8 µs | +2.67% |
| 👁 | WallTime | `test_multiprocessing_map[100000]` | 192.2 ms | 231.8 ms | -17.08% |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
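
For readers checking the Efficiency column by hand: the displayed figures appear consistent with `BASE / HEAD - 1`, expressed as a percentage. This formula is inferred from the numbers in the table, not taken from CodSpeed documentation, and the function name below is hypothetical. A minimal sketch:

```python
def relative_change_percent(base: float, head: float) -> float:
    """Signed change in percent; positive means HEAD is faster than BASE.

    Assumed formula (inferred from the table, not from CodSpeed docs):
    (base / head - 1) * 100
    """
    if head <= 0:
        raise ValueError("head must be positive")
    return (base / head - 1.0) * 100.0


# Values copied from the table above; both sides must use the same unit.
print(f"{relative_change_percent(857.1, 795.3):+.2f}%")   # test_fs_read[1000]
print(f"{relative_change_percent(834.6, 1001.5):+.2f}%")  # test_tcp_connection[1.1.1.1-53]
```

Rows whose BASE and HEAD are shown with few significant digits (for example 57 ms vs 69.8 ms) only approximately reproduce the reported percentage, since the report was presumably computed from unrounded timings.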


Comparing cod-2199-pytest-codspeed-segfaults-in-walltime (c3a194a) with master (adee8a1)


GuillaumeLagrange force-pushed the cod-2199-pytest-codspeed-segfaults-in-walltime branch 4 times, most recently from 6dca575 to a618ede (February 6, 2026 14:42)
GuillaumeLagrange force-pushed the cod-2199-pytest-codspeed-segfaults-in-walltime branch from a618ede to c3a194a (February 6, 2026 14:49)
GuillaumeLagrange marked this pull request as ready for review (February 6, 2026 15:05)
GuillaumeLagrange changed the title from "Pytest codspeed segfaults in walltime" to "Pin python in ci integration tests to prevent intermittent codspeed segfaults in walltime" (Feb 6, 2026)
edgarrmondragon (Contributor) commented

FWIW Python 3.13.12 is similarly affected:

Works well with 3.13.11, so I pinned to the patch version and opened that PR to confirm. Should I create an issue?
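
As a rough illustration of how a test suite could flag the patch releases mentioned so far, here is a minimal sketch. The set below contains only the two versions reported in this thread (3.14.3 and 3.13.12); it is an assumption for illustration, not a confirmed or complete list, and the names `REPORTED_AFFECTED` and `is_reported_affected` are hypothetical.

```python
import sys

# Patch releases reported as crashing in this thread only.
# Assumption: illustrative, not a confirmed or exhaustive list.
REPORTED_AFFECTED = {(3, 13, 12), (3, 14, 3)}


def is_reported_affected(version=None) -> bool:
    """Return True if (major, minor, micro) matches a reported version."""
    if version is None:
        version = sys.version_info
    return tuple(version[:3]) in REPORTED_AFFECTED


if is_reported_affected():
    # Only warn: the crashes were described as intermittent.
    print(
        "warning: this Python patch release was reported to segfault "
        "intermittently with walltime benchmarks",
        file=sys.stderr,
    )
```

Matching on the full patch version rather than the minor version reflects the report that 3.13.11 works while 3.13.12 does not.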

GuillaumeLagrange (Contributor, Author) commented Feb 6, 2026

@edgarrmondragon Thank you very much for the report. I have created the issue, where we'll clarify. Please let us know if you find other versions that are affected.

GuillaumeLagrange merged commit c3a194a into master (Feb 6, 2026)
36 checks passed
GuillaumeLagrange deleted the cod-2199-pytest-codspeed-segfaults-in-walltime branch (February 6, 2026 15:44)
cheeeee pushed a commit to cheeeee/bytewax that referenced this pull request Mar 6, 2026
All bytewax Py<T> references are properly guarded with SafePy (15
struct fields, 5 Drop impls, PICKLE_MODULE static). The SIGSEGV comes
from pytest-codspeed's walltime profiler shutdown conflicting with
Python 3.14 finalization (CodSpeedHQ/pytest-codspeed#105). Benchmark
data is captured before the shutdown crash, so catch exit 139 and
emit a warning instead of failing the job.
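
The "catch exit 139" approach described in this commit message could look roughly like the following. This is a sketch under assumptions, not the bytewax workflow itself (which is not shown here); the function names are hypothetical. Exit status 139 is what a POSIX shell reports for a process killed by SIGSEGV (128 + 11), while Python's `subprocess` reports the same event as a negative return code.

```python
import signal
import subprocess
import sys

# 128 + signal number is the exit status a POSIX shell reports (139 for SIGSEGV).
SHELL_SEGFAULT_EXIT = 128 + signal.SIGSEGV


def is_segfault(returncode: int) -> bool:
    """True for either encoding of 'killed by SIGSEGV'."""
    return returncode in (-signal.SIGSEGV, SHELL_SEGFAULT_EXIT)


def run_tolerating_segfault(cmd: list[str]) -> int:
    """Run cmd; downgrade a segfault exit to a warning and return 0."""
    result = subprocess.run(cmd)
    if is_segfault(result.returncode):
        print(
            "warning: process exited with SIGSEGV; treating as success",
            file=sys.stderr,
        )
        return 0
    return result.returncode
```

A CI step could then call something like `sys.exit(run_tolerating_segfault(["pytest", "--codspeed"]))`. Note that this masks any segfault, so it is only reasonable if, as the commit message states, the benchmark data has already been captured before the crash.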
cheeeee added a commit to cheeeee/bytewax that referenced this pull request Mar 6, 2026
* Fix benchmark workspace permission errors on self-hosted runners

* Upgrade actions to latest versions, improve caching and sccache

- Upgrade checkout v3→v6, cache v4→v5, upload-artifact v4→v7,
  download-artifact v4→v8, setup-just v2→v3, dawidd6 v6→v16
- Add restore-keys to all Cargo caches for warm restarts
- Add cache-dependency-glob to all setup-uv calls
- Set CARGO_INCREMENTAL=0 for sccache compatibility
- Add retention-days: 7 to CI wheel artifacts

* Bump Rust 1.74.1 → 1.85.0, update dependencies and Dockerfile

- Rust toolchain: 1.74.1 → 1.85.0
- Widen serde/tokio/fastrand Cargo.toml pins, cargo update resolves
  174 packages (tokio 1.50, serde 1.0.228, chrono 0.4.44, etc.)
- pre-commit-hooks: v4.4.0 → v5.0.0
- Dockerfile: rust:1.68-bullseye → 1.85-bookworm, distroless
  debian11 → debian12, install maturin via pip instead of obsolete
  konstin2/maturin:v0.12.6 image

* Fix repo-checks OOM, pin benchmarks to Intel runner

- repo-checks: replace `uv pip sync -e .` (OOM during in-process
  cargo build) with maturin-action wheel build + filtered dev deps
  install. Adds sccache, sudo wrapper, workspace permission fix.
- benchmarks: add Intel label to runs-on for perf consistency

* Fix maturin-action directory collision and uv hardlink issues

- Clean stale /__w/_temp/run-maturin-action.sh before builds
  (new runners had a directory at this path, causing Docker bind
  mount failures: "is a directory: permission denied")
- Set UV_LINK_MODE=copy for container jobs to avoid hardlink
  failures across overlay filesystem boundaries (prevents cache
  corruption like missing wheel METADATA)

* Fix concurrent maturin collisions, sccache timeout, uv cache corruption

- Isolate maturin temp dir per matrix job (RUNNER_TEMP unique path)
  to prevent concurrent jobs on the same runner from colliding on
  /__w/_temp/run-maturin-action.sh
- Disable sccache for repo-checks (server can't start in containers)
- Retry uv pip sync with cache clean on failure to recover from
  corrupted wheel metadata in shared uv cache
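
The retry-after-cache-clean idea in the last bullet can be sketched as a small wrapper. This is an illustration only; the actual workflow step is not shown in this thread, and `run_with_cache_reset` is a hypothetical name. The `runner` parameter exists so the logic can be exercised without invoking real commands.

```python
import subprocess


def run_with_cache_reset(cmd, reset_cmd, runner=subprocess.run) -> int:
    """Run cmd; on failure, run reset_cmd once and retry cmd a single time.

    Returns the exit code of the last attempt of cmd.
    """
    first = runner(cmd)
    if first.returncode == 0:
        return 0
    # The first attempt failed: clear the (possibly corrupted) cache, then retry.
    runner(reset_cmd)
    return runner(cmd).returncode
```

Usage might look like `run_with_cache_reset(["uv", "pip", "sync", "requirements.txt"], ["uv", "cache", "clean"])`, where `requirements.txt` is a placeholder file name. Retrying only once keeps a genuinely broken install from looping.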

* Fix maturin-action stale directory cleanup for cross-compile jobs

Replace broken RUNNER_TEMP env var override (resolved at parse time,
not runtime) with targeted cleanup that only removes the maturin temp
script path when it's a stale directory from a previous crashed job.
Remove unnecessary isolation step from benches.yml (x86_64-only builds
don't use Docker).
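
A targeted cleanup of this kind, removing the path only when it is a leftover directory and leaving a regular script file alone, might be sketched as follows. The helper name is hypothetical and the real workflow step is not shown here.

```python
import shutil
from pathlib import Path


def remove_if_stale_directory(path: Path) -> bool:
    """Remove path only when it is a real directory; return True if removed.

    Regular files, symlinks, and missing paths are left untouched.
    """
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
        return True
    return False
```

For the case described above, the argument would be the maturin temp script path, e.g. `Path("/__w/_temp/run-maturin-action.sh")`.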

* Ensure /__w symlink on every runner for cross-compile Docker builds

maturin-action creates the build script at $RUNNER_TEMP but Docker
bind-mounts it via /__w/_temp/ path. The /__w -> /opt/actions-runner/_work
symlink was only created by pre-pull (on one runner). Cross-compile jobs
on other runners fail because Docker can't resolve the path and auto-creates
a directory instead. Fix by ensuring the symlink exists on each runner
before the maturin-action step.

* Fix SIGSEGV in Python 3.13/3.14: wrap PICKLE_MODULE static in SafePy

The static GILOnceCell<Py<PyModule>> at pyo3_extensions.rs:17 was the
only bare Py<T> in a global/static context not wrapped in SafePy. During
Python 3.13+ interpreter finalization, its drop calls Py_DECREF on an
already-freed type object, causing SIGSEGV (exit 139). Wrapping in
SafePy<PyModule> checks Py_IsFinalizing() before dropping.
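
The `SafePy` wrapper itself is Rust and is not shown in this thread. As a Python-level analogue of the same guard (skip cleanup once the interpreter is finalizing), here is a minimal sketch using `sys.is_finalizing()`. The class name and its attributes are hypothetical and do not mirror the bytewax implementation.

```python
import sys


class GuardedHandle:
    """Wraps a resource and skips cleanup during interpreter shutdown."""

    def __init__(self, resource):
        self.resource = resource
        self.released = False

    def release(self) -> bool:
        """Close the resource unless the interpreter is finalizing."""
        if sys.is_finalizing():
            # During finalization, objects the cleanup depends on may
            # already be gone, so touching them is unsafe; skip cleanup.
            return False
        self.resource.close()
        self.released = True
        return True
```

The trade-off is the same as described above: skipping the release at shutdown avoids touching already-freed state, on the assumption that process exit makes the cleanup unnecessary anyway.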

* Handle CodSpeed walltime SIGSEGV on Python 3.13+ benchmarks

All bytewax Py<T> references are properly guarded with SafePy (15
struct fields, 5 Drop impls, PICKLE_MODULE static). The SIGSEGV comes
from pytest-codspeed's walltime profiler shutdown conflicting with
Python 3.14 finalization (CodSpeedHQ/pytest-codspeed#105). Benchmark
data is captured before the shutdown crash, so catch exit 139 and
emit a warning instead of failing the job.

---------

Co-authored-by: Nick Bozhenko <nick.bozhenko@lotusflare.com>

3 participants